Spectrum-based de novo repeat detection in genomic sequences.
Internal identifier: 002949 (Main/Exploration); previous: 002948; next: 002950
Authors: Huy Hoang Do [Singapore]; Kwok Pui Choi; Franco P. Preparata; Wing Kin Sung; Louxin Zhang
Source:
- Journal of computational biology : a journal of computational molecular cell biology [ 1557-8666 ] ; 2008.
French descriptors
- KwdFr :
- MESH :
English descriptors
- KwdEn :
- MESH :
Abstract
A novel approach to the detection of genomic repeats is presented in this paper. The technique, dubbed SAGRI (Spectrum Assisted Genomic Repeat Identifier), is based on the spectrum (set of sequence k-mers, for some k) of the genomic sequence. Specifically, the genome is scanned twice. The first scan (FindHit) detects candidate pairs of repeat-segments, by effectively reconstructing portions of the Euler path of the (k-1)-mer graph of the genome only in correspondence with likely repeat sites. This process produces candidate repeat pairs, for which the location of the leftmost term is unknown. Candidate pairs are then subjected to validation in a second scan, in which the genome is labelled for hits in the (much smaller) spectrum of the repeat candidates: high hit density is taken as evidence of the location of the first segment of a repeat, and the pair of segments is then certified by pairwise alignment. The design parameters of the technique are selected on the basis of a careful probabilistic analysis (based on random sequences). SAGRI is compared with three leading repeat-finding tools on both synthetic and natural DNA sequences, and found to be uniformly superior in versatility (ability to detect repeats of different lengths) and accuracy (the central goal of repeat finding), while being quite competitive in speed. An executable program can be downloaded at http://sagri.comp.nus.edu.sg.
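The second scan described in the abstract can be illustrated with a small sketch. This is not the SAGRI implementation: the function names, the sliding-window size, and the density threshold are all assumptions made for illustration, and the pairwise-alignment certification step is omitted. The sketch builds the k-mer spectrum of a candidate segment, labels each genome position whose k-mer falls in that spectrum, and reports windows with high hit density.

```python
def kmer_spectrum(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings (the k-mer spectrum) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hit_density(genome: str, spectrum: set[str], k: int, window: int) -> list[float]:
    """Label every k-mer position of the genome as hit / no hit against
    `spectrum`, then return the fraction of hits in each sliding window of
    `window` consecutive positions. Entry j covers positions j .. j+window-1."""
    hits = [genome[i:i + k] in spectrum for i in range(len(genome) - k + 1)]
    densities = []
    running = 0  # number of hits inside the current window
    for i, is_hit in enumerate(hits):
        running += is_hit
        if i >= window:
            running -= hits[i - window]  # drop the position that left the window
        if i >= window - 1:
            densities.append(running / window)
    return densities


def dense_window_starts(densities: list[float], threshold: float) -> list[int]:
    """Return the start positions of windows whose hit density reaches the
    (hypothetical) threshold; these are candidate repeat locations."""
    return [j for j, d in enumerate(densities) if d >= threshold]
```

For example, with the candidate segment `ACGTTGCA`, `k = 4`, and a window of 5 positions, a genome that contains the segment twice yields two fully dense windows, one at each occurrence. In practice the threshold would be chosen below 1.0 so that diverged copies of a repeat are still detected.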
DOI: 10.1089/cmb.2008.0013
PubMed: 18549302
Affiliations:
Links to previous steps (curation, corpus...)
- to stream PubMed, to step Corpus: 002103
- to stream PubMed, to step Curation: 002103
- to stream PubMed, to step Checkpoint: 001F57
- to stream Ncbi, to step Merge: 000606
- to stream Ncbi, to step Curation: 000606
- to stream Ncbi, to step Checkpoint: 000606
- to stream Main, to step Merge: 002975
- to stream Main, to step Curation: 002949
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">Spectrum-based de novo repeat detection in genomic sequences.</title>
<author><name sortKey="Do, Huy Hoang" sort="Do, Huy Hoang" uniqKey="Do H" first="Huy Hoang" last="Do">Huy Hoang Do</name>
<affiliation wicri:level="4"><nlm:affiliation>Department of Computer Science, National University of Singapore, Singapore.</nlm:affiliation>
<country xml:lang="fr">Singapour</country>
<wicri:regionArea>Department of Computer Science, National University of Singapore</wicri:regionArea>
<orgName type="university">Université nationale de Singapour</orgName>
</affiliation>
</author>
<author><name sortKey="Choi, Kwok Pui" sort="Choi, Kwok Pui" uniqKey="Choi K" first="Kwok Pui" last="Choi">Kwok Pui Choi</name>
</author>
<author><name sortKey="Preparata, Franco P" sort="Preparata, Franco P" uniqKey="Preparata F" first="Franco P" last="Preparata">Franco P. Preparata</name>
</author>
<author><name sortKey="Sung, Wing Kin" sort="Sung, Wing Kin" uniqKey="Sung W" first="Wing Kin" last="Sung">Wing Kin Sung</name>
</author>
<author><name sortKey="Zhang, Louxin" sort="Zhang, Louxin" uniqKey="Zhang L" first="Louxin" last="Zhang">Louxin Zhang</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PubMed</idno>
<date when="2008">2008</date>
<idno type="RBID">pubmed:18549302</idno>
<idno type="pmid">18549302</idno>
<idno type="doi">10.1089/cmb.2008.0013</idno>
<idno type="wicri:Area/PubMed/Corpus">002103</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">002103</idno>
<idno type="wicri:Area/PubMed/Curation">002103</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">002103</idno>
<idno type="wicri:Area/PubMed/Checkpoint">001F57</idno>
<idno type="wicri:explorRef" wicri:stream="Checkpoint" wicri:step="PubMed">001F57</idno>
<idno type="wicri:Area/Ncbi/Merge">000606</idno>
<idno type="wicri:Area/Ncbi/Curation">000606</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">000606</idno>
<idno type="wicri:Area/Main/Merge">002975</idno>
<idno type="wicri:Area/Main/Curation">002949</idno>
<idno type="wicri:Area/Main/Exploration">002949</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en">Spectrum-based de novo repeat detection in genomic sequences.</title>
<author><name sortKey="Do, Huy Hoang" sort="Do, Huy Hoang" uniqKey="Do H" first="Huy Hoang" last="Do">Huy Hoang Do</name>
<affiliation wicri:level="4"><nlm:affiliation>Department of Computer Science, National University of Singapore, Singapore.</nlm:affiliation>
<country xml:lang="fr">Singapour</country>
<wicri:regionArea>Department of Computer Science, National University of Singapore</wicri:regionArea>
<orgName type="university">Université nationale de Singapour</orgName>
</affiliation>
</author>
<author><name sortKey="Choi, Kwok Pui" sort="Choi, Kwok Pui" uniqKey="Choi K" first="Kwok Pui" last="Choi">Kwok Pui Choi</name>
</author>
<author><name sortKey="Preparata, Franco P" sort="Preparata, Franco P" uniqKey="Preparata F" first="Franco P" last="Preparata">Franco P. Preparata</name>
</author>
<author><name sortKey="Sung, Wing Kin" sort="Sung, Wing Kin" uniqKey="Sung W" first="Wing Kin" last="Sung">Wing Kin Sung</name>
</author>
<author><name sortKey="Zhang, Louxin" sort="Zhang, Louxin" uniqKey="Zhang L" first="Louxin" last="Zhang">Louxin Zhang</name>
</author>
</analytic>
<series><title level="j">Journal of computational biology : a journal of computational molecular cell biology</title>
<idno type="eISSN">1557-8666</idno>
<imprint><date when="2008" type="published">2008</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Algorithms</term>
<term>Genome, Human</term>
<term>Humans</term>
<term>Pattern Recognition, Automated</term>
<term>Probability</term>
<term>Repetitive Sequences, Nucleic Acid</term>
<term>Sequence Analysis, DNA (methods)</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr"><term>Algorithmes</term>
<term>Analyse de séquence d'ADN ()</term>
<term>Génome humain</term>
<term>Humains</term>
<term>Probabilité</term>
<term>Reconnaissance automatique des formes</term>
<term>Séquences répétées d'acides nucléiques</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en"><term>Sequence Analysis, DNA</term>
</keywords>
<keywords scheme="MESH" xml:lang="en"><term>Algorithms</term>
<term>Genome, Human</term>
<term>Humans</term>
<term>Pattern Recognition, Automated</term>
<term>Probability</term>
<term>Repetitive Sequences, Nucleic Acid</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr"><term>Algorithmes</term>
<term>Analyse de séquence d'ADN</term>
<term>Génome humain</term>
<term>Humains</term>
<term>Probabilité</term>
<term>Reconnaissance automatique des formes</term>
<term>Séquences répétées d'acides nucléiques</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">A novel approach to the detection of genomic repeats is presented in this paper. The technique, dubbed SAGRI (Spectrum Assisted Genomic Repeat Identifier), is based on the spectrum (set of sequence k-mers, for some k) of the genomic sequence. Specifically, the genome is scanned twice. The first scan (FindHit) detects candidate pairs of repeat-segments, by effectively reconstructing portions of the Euler path of the (k-1)-mer graph of the genome only in correspondence with likely repeat sites. This process produces candidate repeat pairs, for which the location of the leftmost term is unknown. Candidate pairs are then subjected to validation in a second scan, in which the genome is labelled for hits in the (much smaller) spectrum of the repeat candidates: high hit density is taken as evidence of the location of the first segment of a repeat, and the pair of segments is then certified by pairwise alignment. The design parameters of the technique are selected on the basis of a careful probabilistic analysis (based on random sequences). SAGRI is compared with three leading repeat-finding tools on both synthetic and natural DNA sequences, and found to be uniformly superior in versatility (ability to detect repeats of different lengths) and accuracy (the central goal of repeat finding), while being quite competitive in speed. An executable program can be downloaded at http://sagri.comp.nus.edu.sg.</div>
</front>
</TEI>
<affiliations><list><country><li>Singapour</li>
</country>
<orgName><li>Université nationale de Singapour</li>
</orgName>
</list>
<tree><noCountry><name sortKey="Choi, Kwok Pui" sort="Choi, Kwok Pui" uniqKey="Choi K" first="Kwok Pui" last="Choi">Kwok Pui Choi</name>
<name sortKey="Preparata, Franco P" sort="Preparata, Franco P" uniqKey="Preparata F" first="Franco P" last="Preparata">Franco P. Preparata</name>
<name sortKey="Sung, Wing Kin" sort="Sung, Wing Kin" uniqKey="Sung W" first="Wing Kin" last="Sung">Wing Kin Sung</name>
<name sortKey="Zhang, Louxin" sort="Zhang, Louxin" uniqKey="Zhang L" first="Louxin" last="Zhang">Louxin Zhang</name>
</noCountry>
<country name="Singapour"><noRegion><name sortKey="Do, Huy Hoang" sort="Do, Huy Hoang" uniqKey="Do H" first="Huy Hoang" last="Do">Huy Hoang Do</name>
</noRegion>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 002949 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 002949 | SxmlIndent | more
To link to this page from within the Wicri network
{{Explor lien |wiki= Sante |area= MersV1 |flux= Main |étape= Exploration |type= RBID |clé= pubmed:18549302 |texte= Spectrum-based de novo repeat detection in genomic sequences. }}
To generate wiki pages
HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Exploration/RBID.i -Sk "pubmed:18549302" \
| HfdSelect -Kh $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd \
| NlmPubMed2Wicri -a MersV1
This area was generated with Dilib version V0.6.33.